Goto

Collaborating Authors

 approximation theorem


Theory of periodic convolutional neural network

arXiv.org Artificial Intelligence

We introduce a novel convolutional neural network architecture, termed the \emph{periodic CNN}, which incorporates periodic boundary conditions into the convolutional layers. Our main theoretical contribution is a rigorous approximation theorem: periodic CNNs can approximate ridge functions depending on $d-1$ linear variables in a $d$-dimensional input space, while such approximation is impossible in lower-dimensional ridge settings ($d-2$ or fewer variables). This result establishes a sharp characterization of the expressive power of periodic CNNs. Beyond the theory, our findings suggest that periodic CNNs are particularly well-suited for problems where data naturally admits a ridge-like structure of high intrinsic dimension, such as image analysis on wrapped domains, physics-informed learning, and materials science. The work thus both expands the mathematical foundation of CNN approximation theory and highlights a class of architectures with surprising and practically relevant approximation capabilities.


Distributionally robust approximation property of neural networks

arXiv.org Machine Learning

The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby extending classical universal approximation theorems even beyond the traditional $L^p$-setting. The covered classes of neural networks include widely used architectures like feedforward neural networks with non-polynomial activation functions, deep narrow networks with ReLU activation functions and functional input neural networks.


Algorithms and data structures for automatic precision estimation of neural networks

arXiv.org Artificial Intelligence

We describe algorithms and data structures to extend a neural network library with automatic precision estimation for floating point computations. We also discuss conditions to make estimations exact and preserve high computation performance of neural networks training and inference. Numerical experiments show the consequences of significant precision loss for particular values such as inference, gradients and deviations from mathematically predicted behavior. It turns out that almost any neural network accumulates computational inaccuracies. As a result, its behavior does not coincide with predicted by the mathematical model of neural network. This shows that tracking of computational inaccuracies is important for reliability of inference, training and interpretability of results.



Quantitative Approximation for Neural Operators in Nonlinear Parabolic Equations

arXiv.org Machine Learning

Neural operators serve as universal approximators for general continuous operators. In this paper, we derive the approximation rate of solution operators for the nonlinear parabolic partial differential equations (PDEs), contributing to the quantitative approximation theorem for solution operators of nonlinear PDEs. Our results show that neural operators can efficiently approximate these solution operators without the exponential growth in model complexity, thus strengthening the theoretical foundation of neural operators. A key insight in our proof is to transfer PDEs into the corresponding integral equations via Duahamel's principle, and to leverage the similarity between neural operators and Picard's iteration, a classical algorithm for solving PDEs. This approach is potentially generalizable beyond parabolic PDEs to a range of other equations, including the Navier-Stokes equation, nonlinear Schr\"odinger equations and nonlinear wave equations, which can be solved by Picard's iteration.


Cauchy activation function and XNet

arXiv.org Artificial Intelligence

In today's scientific exploration, the rise of computational technology has marked a significant turning point. Traditional methods of theory and experimentation are now complemented by advanced computational tools that tackle the complexity of real-world systems. Machine learning, particularly deep neural networks, has led to breakthroughs in fields like image processing and language understanding [3, 7], and its application to scientific problems-such as predicting protein structures [9, 10] or forecasting weather [13]-demonstrates its potential to revolutionize our approach. One of the primary challenges in computational mathematics and artificial intelligence (AI) lies in determining the most appropriate function to accurately model a given dataset. In machine learning, the objective is to leverage such functions for predictive purposes. Traditional methods rely on predetermined classes of functions, such as polynomials or Fourier series, which, though simple and computationally manageable, may limit the flexibility and accuracy of the fit. In contrast, modern deep learning neural networks primarily employ locally linear functions with nonlinear activations.


Universal approximation theorem for neural networks with inputs from a topological vector space

arXiv.org Artificial Intelligence

We study feedforward neural networks with inputs from a topological vector space (TVS-FNNs). Unlike traditional feedforward neural networks, TVS-FNNs can process a broader range of inputs, including sequences, matrices, functions and more. We prove a universal approximation theorem for TVS-FNNs, which demonstrates their capacity to approximate any continuous function defined on this expanded input space.


Deep Neural Networks: Multi-Classification and Universal Approximation

arXiv.org Machine Learning

We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements in $\mathbb{R}^d$, where $d\ge1,$ and $M$ classes, thereby ensuring accurate classification. By modeling the neural network as a time-discrete nonlinear dynamical system, we interpret the memorization property as a problem of simultaneous or ensemble controllability. This problem is addressed by constructing the network parameters inductively and explicitly, bypassing the need for training or solving any optimization problem. Additionally, we establish that such a network can achieve universal approximation in $L^p(\Omega;\mathbb{R}_+)$, where $\Omega$ is a bounded subset of $\mathbb{R}^d$ and $p\in[1,\infty)$, using a ReLU deep neural network with a width of $d+1$. We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq1$. Our proofs are constructive, offering explicit values for the biases and weights involved.


Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach

arXiv.org Artificial Intelligence

Predicting the conditional evolution of Volterra processes with stochastic volatility is a crucial challenge in mathematical finance. While deep neural network models offer promise in approximating the conditional law of such processes, their effectiveness is hindered by the curse of dimensionality caused by the infinite dimensionality and non-smooth nature of these problems. To address this, we propose a two-step solution. Firstly, we develop a stable dimension reduction technique, projecting the law of a reasonably broad class of Volterra process onto a low-dimensional statistical manifold of non-positive sectional curvature. Next, we introduce a sequentially deep learning model tailored to the manifold's geometry, which we show can approximate the projected conditional law of the Volterra process. Our model leverages an auxiliary hypernetwork to dynamically update its internal parameters, allowing it to encode non-stationary dynamics of the Volterra process, and it can be interpreted as a gating mechanism in a mixture of expert models where each expert is specialized at a specific point in time. Our hypernetwork further allows us to achieve approximation rates that would seemingly only be possible with very large networks.


Inverse Approximation Theory for Nonlinear Recurrent Neural Networks

arXiv.org Artificial Intelligence

We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships using recurrent neural networks (RNNs). This is a so-called Bernstein-type result in approximation theory, which deduces properties of a target function under the assumption that it can be effectively approximated by a hypothesis space. In particular, we show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponential decaying memory structure - a notion that can be made precise. This extends the previously identified curse of memory in linear RNNs into the general nonlinear setting, and quantifies the essential limitations of the RNN architecture for learning sequential relationships with long-term memory. Based on the analysis, we propose a principled reparameterization method to overcome the limitations. Our theoretical results are confirmed by numerical experiments. The code has been released in https://github.com/radarFudan/Curse-of-memory